Identification of a functional 339 bp Alu insertion
polymorphism in the schizophrenia-associated locus at
10q24.32


DEAR EDITOR, 


Genome-wide association studies (GWAS) have identified 
multiple single nucleotide polymorphisms (SNPs) or small 
indels robustly associated with schizophrenia; however, the 
functional risk variations remain largely unknown. We 
investigated the 10q24.32 locus and discovered a 339 bp Alu 
insertion polymorphism (rs71389983) in complete linkage 
disequilibrium (LD) with the schizophrenia GWAS risk variant 
rs7914558. The presence of the Alu insertion at rs71389983 
strongly repressed transcriptional activities in in vitro luciferase 
assays. This polymorphism may be a target for future 
mechanistic research. Our study also underlines the 
importance and necessity of considering previously 
underestimated Alu polymorphisms in future genetic studies of 
schizophrenia. 

Schizophrenia is a severe chronic psychiatric disorder with 
high heritability (Sullivan et al., 2003), and depicting the 
genetic architecture of schizophrenia is essential for 
understanding its pathophysiology. So far, GWAS have 
identified numerous risk loci (Schizophrenia Psychiatric 
Genome-Wide Association Study Consortium, 2011; 
Schizophrenia Working Group of the Psychiatric Genomics 
Consortium, 2014), and several studies have attempted to 
identify causative risk variations and underlying biological 
mechanisms from the massive tagged single nucleotide 
polymorphisms (SNPs) (Duan et al., 2014; Huo et al., 2019; 
Wu et al., 2017, 2019; Yang et al., 2018). However, one
potential limitation of current GWAS platforms is that they 
have primarily focused on SNPs and small indels, ignoring 
other sequence variations that have also been implicated in 
the genetic risk of human disorders including schizophrenia 
(Payer et al., 2017; Song et al., 2018; Yang et al., 2019) and 
in non-human primates (Liu et al., 2018). For instance, Song
et al. (2018) previously identified a functional human-specific 
tandem repeat in the CACNA1C gene as a potential causative 
variation for schizophrenia and bipolar disorder. 

The chromosomal 10q24.32 region is a critical locus 
showing genome-wide significant associations with 
schizophrenia. For example, rs7914558 is reported to be the 
most significant SNP in the 10q24.32 region in the PGC1 
GWAS of European populations (P=1.82x10-°, n=51 695) 
(Schizophrenia Psychiatric Genome-Wide Association Study 
Consortium, 2011), and its association with schizophrenia has 
been further confirmed in subsequent GWAS with increased 
sample size (P=3.49x10-', n=79 845) (Schizophrenia 
Working Group of the Psychiatric Genomics Consortium, 
2014). Intriguingly, according to data from a recent GWAS of 
East Asian populations, rs7914558 is also significantly 
associated with schizophrenia genome-wide (P=3.50x10°, 
n=58 140) (Lam et al., 2019). In the present study, through 
population genetic analyses, in vitro luciferase assays, and 
expression quantitative trait loci (eQTL) data, we identified a 
functional 339 bp Alu insertion polymorphism (rs71389983) 
within the 9th intron of the AS3MT gene in complete LD with 
rs7914558. 

The study protocol was approved by the Institutional Review 
Board of the Kunming Institute of Zoology (KIZ), Chinese 
Academy of Sciences (CAS). Informed consent was obtained 
before any study-related procedures were carried out. 
Genotyping of rs71389983 and rs7914558 was conducted 
using polymerase chain reaction (PCR) on 38 European and 
39 Han Chinese subjects, with amplicons analyzed using 
agarose gel electrophoresis and Sanger sequencing to
distinguish the alleles. The PCR primers were:
5'-ATGTAACTGGTATATCCATCGCCT-3' (forward) and
5'-AGAAGACTCAAACAGATGAACGGA-3' (reverse) for rs71389983; and
5'-CTCTACTTGCCCCCTTACAGC-3' (forward) and
5'-GAACCGTATCAGTAATCCAACAGA-3' (reverse) for rs7914558.

The HEK293T (human embryonic kidney 293T) and U87MG 
(human glioblastoma astrocytoma) cell lines used were 
originally obtained from the Kunming Cell Bank, KIZ, and the 
Cell Bank of Type Culture Collection of the CAS, respectively. 
Both cell lines were checked regularly for mycoplasma 
infection using PCR and microscopy. No cells were found to 
be contaminated during the study. The HEK293T cells were 
cultured in a humidified 5% CO2 incubator at 37 °C in DMEM
basic (Dulbecco's Modified Eagle's Medium) (Gibco, USA) 
supplemented with 10% fetal bovine serum, 1% non-essential 
amino acids, 1% sodium pyruvate, and 1% penicillin- 
streptomycin. The U87MG cells were cultured in a humidified 
5% CO2 incubator at 37 °C in MEM (Minimum Essential
Medium) supplemented with 10% fetal bovine serum, 1% 
sodium pyruvate, 2.2 g/L NaHCO3, and 1% penicillin- 
streptomycin. 

For the reporter gene assays, DNA fragments
encompassing rs71389983 with either allele were amplified
from human genomic DNA using primers
5'-GGCTGCCAGGTTCAAGTAAT-3' (forward) and
5'-CACACTGGAATACTATTCAGACTT-3' (reverse). The
sequences were then cloned into the pGL3-promoter vector 
(Promega, USA) upstream of the SV40 promoter. The 
recombinant clones were verified through Sanger sequencing 
to ensure they only differed at the rs71389983 locus. The 
pGL3-promoter reporters were transiently co-transfected into 
cells together with the pRL-TK plasmid (Promega, USA) using 
Lipofectamine 3000 (Thermo Fisher Scientific, USA). All 
plasmids were accurately quantified and equal amounts were 
used for transfection. All transfection procedures lasted 36–48
h, and the cells were then collected to measure luciferase 
activity using the Dual-Luciferase Reporter Assay System 
(Promega, USA). The activity of firefly luciferase was 
normalized to that of Renilla luciferase to control for variations 
in transfection efficiency. All assays were performed with at 
least three biological replicates in independent experiments, 
and statistical analyses were performed by two-tailed t-tests. 
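
The normalization and comparison described above can be sketched as follows. This is a minimal illustration only: the luminescence readings are hypothetical, the function names are ours, and a pooled-variance Student's t statistic is assumed because the text does not state which t-test variant was used. Converting the statistic into a two-tailed P value requires the t distribution (for example via `scipy.stats.ttest_ind`), which is omitted here to keep the sketch dependency-free.

```python
from statistics import mean, stdev


def normalize(firefly, renilla):
    """Return the firefly/Renilla ratio for each well."""
    return [f / r for f, r in zip(firefly, renilla)]


def fold_change(sample_ratios, control_ratios):
    """Express each sample ratio relative to the mean control ratio."""
    control_mean = mean(control_ratios)
    return [x / control_mean for x in sample_ratios]


def student_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5
    return t, na + nb - 2


# Hypothetical raw readings (arbitrary units), three replicates per construct.
empty_vector = normalize([1000.0, 1100.0, 900.0], [100.0, 100.0, 100.0])
with_alu = normalize([100.0, 120.0, 80.0], [100.0, 100.0, 100.0])
without_alu = normalize([1500.0, 1400.0, 1600.0], [100.0, 100.0, 100.0])

# Fold change relative to the empty pGL3-promoter vector.
fc_with = fold_change(with_alu, empty_vector)
fc_without = fold_change(without_alu, empty_vector)

# A negative t statistic means lower activity with the Alu insertion.
t_stat, dof = student_t(fc_with, fc_without)
```

Dividing by the Renilla signal first, and only then by the control mean, keeps well-to-well differences in transfection efficiency out of the fold-change estimate.
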
We also examined the impacts of risk SNPs on gene mRNA 
expression using two public RNA-seq brain eQTL datasets, i.e.,
BrainSeq Phase 2 (http://eqtl.brainseq.org/phase2/eqtl/)
and GTEx (https://www.gtexportal.org/) (Collado-Torres et al., 
2019; GTEx Consortium et al., 2017). Briefly, from the 
BrainSeq dataset, we obtained eQTL data of the dorsolateral 
prefrontal cortex (DLPFC) from 397 individuals, which were 
calculated using linear regression by covarying diagnosis, 
gender, genotyping principal components, and expression 
principal components. From the GTEx dataset, we retrieved 
the eQTL association results from the frontal cortex (BA9) of
175 subjects, which were calculated using linear regression by
covarying genotyping principal components, gender,
genotyping platforms, and additional covariates.
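
As a rough illustration of what such a covariate-adjusted eQTL regression estimates, the sketch below fits expression on allele dosage plus one covariate by ordinary least squares. It is not the BrainSeq or GTEx pipeline: the data are hypothetical, a single covariate stands in for the full covariate set, and the helper names `solve` and `ols` are ours.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(design, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: risk-allele dosage (0/1/2), one binary covariate,
# and noise-free expression values constructed with a dosage effect of -0.5.
dosage = [0, 1, 2, 0, 1, 2]
covariate = [0, 0, 0, 1, 1, 1]
expression = [5.0 - 0.5 * g + 0.3 * c for g, c in zip(dosage, covariate)]

# Design matrix columns: intercept, dosage, covariate.
design = [[1.0, float(g), float(c)] for g, c in zip(dosage, covariate)]
intercept, beta_dosage, beta_cov = ols(design, expression)
```

The coefficient on dosage is the per-allele effect on expression after adjusting for the covariate; a negative value corresponds to lower expression with each additional copy of the allele.
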

A recent study has shown that a subset of Alu insertion
polymorphisms exhibit moderate to strong LD (r²>0.7) with
GWAS risk SNPs of complex illnesses (Payer et al., 2017). 
We therefore examined whether there were Alu insertion 
polymorphisms within the 10q24.32 region. Using public 
genomic variation databases (i.e., UCSC, http://genome.ucsc. 
edu/) followed by Sanger sequencing of target regions, we 
identified an Alu insertion polymorphism (339 bp) rs71389983 
in intron 9 of AS3MT, which was in complete LD with 
rs7914558 in the Han Chinese and European populations 
(both r²=1.00, Figure 1A). The presence of the Alu insertion at
rs71389983 was linked with the schizophrenia risk G-allele at 
rs7914558, and therefore may be associated with increased 
risk of schizophrenia. We note that the frequency of 
rs71389983 (and rs7914558) showed divergence between the 
two populations (frequency of Alu insertion at rs71389983: 
0.423 in Han Chinese vs. 0.605 in Europeans). We also 
compared the LD structures of the 10q24.32 region between 
Europeans and East Asians using genotype data from the 
1000 Genomes Project (Genomes Project Consortium et al., 
2015), and found that the LD structures were broadly similar
across the two populations, with only minor differences
(Figure 1A), consistent with the significant associations of
this genomic region in both populations.
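
To make the r² statistic concrete, the sketch below computes it as the squared Pearson correlation between allele dosages at two loci, a common approximation for unphased genotype data. We assume this estimator purely for illustration; the text does not state how r² was calculated, and the genotypes shown are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical dosages for six individuals: copies of the Alu insertion
# at rs71389983 and copies of the G-allele at rs7914558.
alu_dosage = [2, 1, 0, 1, 2, 0]
g_dosage = [2, 1, 0, 1, 2, 0]

# A second hypothetical SNP whose genotypes only partly track the insertion.
other_snp = [2, 1, 0, 0, 2, 1]

complete_ld = r_squared(alu_dosage, g_dosage)   # identical dosages
partial_ld = r_squared(alu_dosage, other_snp)   # imperfect concordance
```

When every individual carries the same number of copies at both loci, the two dosage vectors are identical and r² equals 1, which is the situation reported for rs71389983 and rs7914558.
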

The DNA sequence covering rs7914558 is conserved 
across humans and non-human primates, whereas the Alu 
polymorphism rs71389983 appears to be human-unique. We 
thus performed bioinformatics functional prediction of 
rs7914558 using the HaploReg v4.1 dataset 
(https://pubs.broadinstitute.org/mammals/haploreg/haploreg. 
php) (Ward & Kellis, 2012). However, rs7914558 did not appear
to lie within any DNA segment showing open-chromatin peaks,
transcription factor binding, or histone marks (e.g.,
H3K4me1, H3K4me3, H3K9ac, and H3K27ac). On the other hand,
Alu insertions have been found
to affect both transcription and post-transcriptional processes 
(Hasler & Strub, 2006). Considering that rs71389983 was 
found in intron 9 of AS3MT, we hypothesized that it may lie
within an enhancer/repressor region. To test this, we amplified
the DNA fragments spanning rs71389983 from individuals
homozygous for each allele (PCR product length: 589 bp with
the Alu insertion; 250 bp without it), and then sub-cloned them
into the pGL3 promoter vector. These plasmids were then
transfected into
the human HEK293T and U87MG cell lines, and reporter gene 
assays were carried out to examine their regulatory effects. In 
the HEK293T cells, the transcriptional activity of the pGL3 
promoter containing the Alu insertion at rs71389983 was 
significantly lower than that of the promoter without the
insertion (P<0.000 01, Figure 1B) and that of the empty vector
(P<0.000 01). In the U87MG cells, this trend was reproduced:
the presence of the Alu insertion at rs71389983
corresponded to significantly lower activity of the pGL3
promoter compared with both the pGL3 promoter without the
insertion (P<0.000 01, Figure 1B) and the empty vector (P<0.000 01).

Figure 1 Linkage disequilibrium (LD) analysis of rs7914558 and nearby variations (including rs71389983) in European and East Asian
populations (A); Reporter gene assay testing regulatory activity of rs71389983 in HEK293T and U87MG cells (B); Expression quantitative
trait loci (eQTL) analyses of rs7914558 with CNNM2 mRNA in BrainSeq and GTEx datasets (C)

Rs7914558 is located in an intron of CNNM2, and the Alu polymorphism rs71389983 is located in intron 9 of AS3MT. The two variations are in complete LD in
both Europeans and East Asians. Effects of rs71389983 allele variation on pGL3 promoter activity in HEK293T and U87MG cells are shown.
“Comparison group” in the figure represents the empty pGL3 promoter vector. Values represent fold change in luciferase activity relative to the control pGL3 vector.
Means and standard deviations of at least three independent experiments are shown. ****: P<0.000 01.


Therefore, rs71389983 is likely a functional variation and the 
Alu insertion at this locus likely exerts repressive effects on 
transcription. In addition to the consistent trend of the effect of 
the rs71389983 Alu insertion on both cell lines, a slight 
difference between the HEK293T and U87MG cell results was 
observed, as the pGL3 promoter carrying the "absence of Alu 
insertion” at rs71389983 showed higher transcriptional activity 
than the empty vector in the U87MG cells, but lower activity 
than the empty vector in the HEK293T cells. This 
inconsistency could be explained by the different genetic and 
physiological backgrounds between the different cell lines. 

To further confirm the regulatory effects of the Alu 
polymorphism (rs71389983) on gene expression, we 
examined two public RNA-seq eQTL datasets (i.e., BrainSeq 
and GTEx-brain) in human brains (Collado-Torres et al., 2019; 
GTEx Consortium et al., 2017). As rs71389983 is not 
genotyped in those eQTL databases, we used rs7914558 as 
an index SNP. In the BrainSeq dataset, which included 
DLPFC tissues from 397 individuals, the schizophrenia risk G- 
allele at rs7914558 was significantly associated with increased 
gene expression of BORCS7 (P=9.28x10-’), as well as 
decreased mRNA expression of CNNM2 (P=1.07x10-%, 
Figure 1C) and CYP17A1-AS1 (P=7.98x10~%). In the 175 
frontal cortex (BA9) tissues of the GTEx dataset, the G-allele
at rs7914558 was also strongly associated with increased 
gene expression of BORCS7 (P=1.00*10-°) and decreased 
mRNA expression of CNNM2 (P=0.007 2, Figure 1C), but not 
with the expression of CYP17A1-AS1 (P=0.94). 

Translating the GWAS risk associations of complex 
disorders into biological mechanisms remains an urgent task 
(Barešić et al., 2019; Birnbaum & Weinberger, 2017; Edwards
et al., 2013; Forrest et al., 2018; Gandal et al., 2016).
However, most genetic risk loci are located in noncoding 
regions, which may affect transcription factor binding affinities, 
gene expression, or even cellular physiological processes 
(Duan et al., 2014; Forrest et al., 2017; Li et al., 2011; 
Roussos et al., 2014). We identified a 339 bp Alu insertion 
polymorphism (rs71389983) in the 10q24.32 locus, and 
reporter gene assays showed that different alleles of 
rs71389983 exhibited significantly different regulatory 
activities. The promoter carrying the "absence of Alu insertion" 
at rs71389983 exhibited more than 10-fold higher 
transcriptional activity than the promoter carrying the 
"presence of Alu insertion", suggesting that the Alu insertion 
sequence likely functions as a gene silencer. Although
such silencing is often mediated by epigenetic
mechanisms such as DNA methylation or noncoding RNA,
current genome-wide sequencing technologies are not well
suited to determining the mechanism involved here. For example,
the ENCODE datasets are mostly based on short DNA reads
(≤250 bp) (ENCODE Project Consortium, 2012), and such
methods are not able to precisely map retrotransposons, like 
Alu regions, as Alu elements contain multiple highly similar 
sequences (>300 bp) across the genome. Thus, it is difficult to 
identify the epigenetic or regulatory markers at rs71389983 
(as reflected in the UCSC browser, which shows no ChIP-seq 
data at the rs71389983 locus). To resolve this problem, long- 
read sequencing technologies should be applied. 

We found that in both the BrainSeq and GTEx-brain tissues, 
the schizophrenia risk allele at rs71389983 (indexed by its
completely linked SNP rs7914558) predicted lower expression of
CNNM2, consistent with the results of our in vitro luciferase
assays. Therefore, CNNM2 is likely a schizophrenia risk gene, in
agreement with a previous study (Thyme et al., 2019). However,
the present results do not establish that rs71389983
directly regulates CNNM2 expression; further functional
studies (e.g., CRISPR/Cas9 genome editing) are needed.
The significant association of risk SNPs (e.g., rs7914558) at 
10q24.32 with BORCS7 expression is also consistent with 
earlier research (Duarte et al., 2016; Li et al., 2016a). 

Previous studies have demonstrated that Alu insertion 
polymorphisms are significantly associated with multiple 
complex human disorders and traits, including multiple 
sclerosis, obesity, height, Alzheimer's disease, breast cancer, 
and blood pressure (Payer et al., 2017). Our recent study also 
identified a functional Alu polymorphism at 3p21.1 affecting
DNA regulatory activity, which was significantly associated 
with increased risk of psychiatric disorders (e.g., 
schizophrenia, bipolar disorder, and major depressive 
disorder) and cognitive dysfunction (Yang et al., 2019).
Combined with the present data, these studies suggest that 
such types of sequence variations may play essential roles in 
shaping phenotypes during primate or human evolution 
(Deininger, 2011; Hasler & Strub, 2006). However, our 
previous study showed that the Alu insertion sequence at 
3p21.1 increased regulatory activity (Yang et al., 2019), and 
herein the Alu insertion sequence at 10q24.32 reduced 
transcriptional activities. Although the majority of the Alu 
sequences across the genome show high similarity, their 
functional regulatory effects may be distinct. 

In summary, we discovered a human-unique Alu insertion in 
strong LD with the schizophrenia GWAS risk SNP at 
10q24.32. Schizophrenia is hypothesized to be specific to or 
dominant in humans, and its evolutionary mechanism may be 
related to unique human variations. For example, we 
previously identified a human-specific allele rs13107325 at 
SLC39A8 undergoing Darwinian natural selection, which 
enabled humans to adapt to cold environments in Europe, but 
simultaneously also increased the risk of schizophrenia (Li et 
al., 2016b). The schizophrenia risk allele at SLC39A8 is also 
significantly associated with cognitive function and brain 
structures in human populations (Davies et al., 2018; Elliott et 
al., 2018; Luo et al., 2019; Savage et al., 2018). Assuming that 
the human-unique Alu insertions play pivotal roles in shaping 
humanity, such as development of the dorsolateral prefrontal 
cortex and higher order human features (e.g., higher cognitive 
processing) (Wang & Arnsten, 2015), they may also confer
deleterious effects on human health, such
as predisposition to schizophrenia. Investigations of such
human-unique variations in non-human primates or other 
species (such as tree shrews) that are evolutionarily close to 
humans or in human-induced pluripotent stem cells (hiPSC) or 
reprogrammed cells via genome editing, may provide novel 
insights into the pathophysiology of schizophrenia and other 
human-dominant disorders (Falk et al., 2016; Hoffman et al., 
2019; Luo et al., 2016; Xiao et al., 2017; Xu et al., 2013; Yao, 
2017). 